970 results for Statistical measures


Relevance:

70.00%

Publisher:

Abstract:

Several statistical downscaling models have been developed in the past couple of decades to assess the hydrologic impacts of climate change by projecting station-scale hydrological variables from large-scale atmospheric variables simulated by general circulation models (GCMs). This paper presents and compares different statistical downscaling models that use multiple linear regression (MLR), positive coefficient regression (PCR), stepwise regression (SR), and support vector machine (SVM) techniques for estimating monthly rainfall amounts in the state of Florida. Mean sea level pressure, air temperature, geopotential height, specific humidity, U wind, and V wind are used as the explanatory variables/predictors in the downscaling models. Data for these variables are obtained from the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis dataset and the Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled Global Climate Model, version 3 (CGCM3) simulations. Principal component analysis (PCA) and the fuzzy c-means clustering method (FCM) are used as part of the downscaling models to reduce the dimensionality of the dataset and to identify clusters in the data, respectively. Evaluation of the performances of the models using different error and statistical measures indicates that the SVM-based model performed better than all the other models in reproducing most monthly rainfall statistics at 18 sites. Output from the third-generation CGCM3 GCM for the A1B scenario was used for future projections. For the projection period 2001-10, MLR was used to link the predictor variables at the GCM and NCEP grid scales; this yielded better reproduction of monthly rainfall statistics at most of the stations (12 out of 18) than the spatial interpolation technique used in earlier studies.
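As a minimal sketch of the MLR technique compared in this abstract: an ordinary-least-squares fit of a station-scale variable on several large-scale predictors, with the fit quality summarised by RMSE. The data below are synthetic and the coefficients illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for six large-scale predictors (e.g. sea level
# pressure, air temperature, ...) over 120 months at one station.
n_months = 120
X = rng.normal(size=(n_months, 6))
true_coef = np.array([2.0, -1.0, 0.5, 0.0, 1.5, -0.5])   # illustrative
rainfall = X @ true_coef + 10.0 + rng.normal(scale=0.1, size=n_months)

# Fit MLR by ordinary least squares (intercept via a column of ones).
A = np.column_stack([np.ones(n_months), X])
coef, *_ = np.linalg.lstsq(A, rainfall, rcond=None)

predicted = A @ coef
rmse = np.sqrt(np.mean((predicted - rainfall) ** 2))
```

A real downscaling model would first reduce the predictor fields with PCA, as the paper describes, before the regression step.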

Relevance:

70.00%

Publisher:

Abstract:

A statistical modeling approach is proposed for use in searching large microarray data sets for genes that have a transcriptional response to a stimulus. The approach is unrestricted with respect to the timing, magnitude or duration of the response, or the overall abundance of the transcript. The statistical model makes an accommodation for systematic heterogeneity in expression levels. Corresponding data analyses provide gene-specific information, and the approach provides a means for evaluating the statistical significance of such information. To illustrate this strategy, we have derived a model to depict the profile expected for a periodically transcribed gene and used it to look for budding yeast transcripts that adhere to this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts and 1,088 genes that show significant periodicity in at least one of the three data sets analyzed. However, only one-quarter of these genes show significant oscillations in at least two data sets and can be classified as periodic with high confidence. The method provides estimates of the mean activation and deactivation times, induced and basal expression levels, and statistical measures of the precision of these estimates for each periodic transcript.
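An illustrative sketch (not the authors' exact model) of fitting a periodic expression profile: a cosine with unknown phase, b + a·cos(ωt + φ), linearised as b + c1·cos(ωt) + c2·sin(ωt) so basal level and amplitude can be estimated by least squares. The period, time points, and parameter values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

period = 60.0                      # assumed cell-cycle period, minutes
t = np.arange(0, 120, 5.0)         # sampling times
w = 2 * np.pi / period
# Synthetic expression time course: basal 5.0, amplitude 2.0, plus noise.
signal = 5.0 + 2.0 * np.cos(w * t - 1.0) + rng.normal(scale=0.05, size=t.size)

# Least-squares fit on the linearised cosine/sine basis.
basis = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
coef, *_ = np.linalg.lstsq(basis, signal, rcond=None)

basal = coef[0]
amplitude = np.hypot(coef[1], coef[2])   # recover a from c1, c2
```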

Relevance:

60.00%

Publisher:

Abstract:

Objective To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design Systematic review. Data sources The electronic databases searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined, and associated articles were identified using a snowballing technique. Selection criteria For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods used either to predict injury categories or to extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for the visualisation of injury patterns and the prediction of future outcomes. However, difficulties related to generalisability, source data quality, complexity of models, and integration of content and technical knowledge were discussed. Conclusions The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years.
With advances in data mining techniques, increased capacity for the analysis of large databases, and the involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see continued growth and advancement of text mining knowledge in the injury field.

Relevance:

60.00%

Publisher:

Abstract:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and devise solutions to problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is that of genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, like Fisher's exact test, the chi-squared measure, mutual information, or z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10,000-20,000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better, but undiscovered, dependencies.
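One of the significance measures named above, Fisher's exact test, can be sketched in a few lines: for a rule X -> A, the one-sided p-value is the hypergeometric probability of seeing at least the observed number of X-and-A co-occurrences given the margins. The counts below are illustrative, not from the thesis.

```python
from math import comb

def fisher_exact_right(n_xa, n_x, n_a, n):
    """One-sided Fisher's exact test: P(at least n_xa co-occurrences
    of X and A | n_x rows with X, n_a rows with A, n rows total)."""
    p = 0.0
    for k in range(n_xa, min(n_x, n_a) + 1):
        # Hypergeometric probability of exactly k co-occurrences.
        p += comb(n_a, k) * comb(n - n_a, n_x - k) / comb(n, n_x)
    return p

# Hypothetical rule X -> A: A holds in 30 of the 40 rows containing X,
# while A holds in 50 of the 100 rows overall (expected co-count: 20).
p_value = fisher_exact_right(30, 40, 50, 100)
```

A small p-value here indicates a dependency unlikely to arise by chance, which is exactly the distinction the thesis draws between genuine dependency rules and traditional association rules.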

Relevance:

60.00%

Publisher:

Abstract:

We present the results of our detailed pseudospectral direct numerical simulation (DNS) studies, with up to 1024^3 collocation points, of incompressible, magnetohydrodynamic (MHD) turbulence in three dimensions, without a mean magnetic field. Our study concentrates on the dependence of various statistical properties of both decaying and statistically steady MHD turbulence on the magnetic Prandtl number Pr_M over a large range, namely 0.01 <= Pr_M <= 10. We obtain data for a wide variety of statistical measures, such as probability distribution functions (PDFs) of the moduli of the vorticity and current density, the energy dissipation rates, and velocity and magnetic-field increments; energy and other spectra; velocity and magnetic-field structure functions, which we use to characterize intermittency; isosurfaces of quantities such as the moduli of the vorticity and current density; and joint PDFs, such as those of the fluid and magnetic dissipation rates. Our systematic study uncovers interesting results that have not been noted hitherto. In particular, we find a crossover from a larger intermittency in the magnetic field than in the velocity field, at large Pr_M, to a smaller intermittency in the magnetic field than in the velocity field, at low Pr_M. Furthermore, a comparison of our results for decaying MHD turbulence and its forced, statistically steady analogue suggests that we have strong universality in the sense that, for a fixed value of Pr_M, multiscaling exponent ratios agree, at least within our error bars, for both decaying and statistically steady homogeneous, isotropic MHD turbulence.

Relevance:

60.00%

Publisher:

Abstract:

Identifying protein-protein interactions is crucial for understanding cellular functions. Genomic data provide both opportunities and challenges in identifying these interactions. We uncover rules for predicting protein-protein interactions using a frequent pattern tree (FPT) approach modified to generate a minimum set of rules (mFPT), with rule attributes constructed from the interaction features of the yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods, such as Bayesian networks and logistic regression, under various statistical measures. Our study indicates that mFPT outperforms the other methods in predicting protein-protein interactions for the database used. Based on the rules generated, we predict a new protein-protein interaction complex whose biological function is related to pre-mRNA splicing, as well as new protein-protein interactions within existing complexes.

Relevance:

60.00%

Publisher:

Abstract:

The InfoCult™ Analyser 1.4 software package that accompanies the book can be downloaded from: ewaluacja.amu.edu.pl

Relevance:

60.00%

Publisher:

Abstract:

The last 30 years have seen Fuzzy Logic (FL) emerge as a method that either complements or challenges stochastic methods as the traditional way of modelling uncertainty. But the circumstances under which FL or stochastic methods should be used are shrouded in disagreement, because the areas of application of statistical and FL methods overlap and opinions differ as to when each method should be used. Practically relevant case studies comparing the two methods are lacking. This work compares stochastic and FL methods for the assessment of spare capacity, using the example of pharmaceutical high purity water (HPW) utility systems. The goal of this study was to find the most appropriate method for modelling uncertainty in industrial-scale HPW systems. The results provide evidence which suggests that stochastic methods are superior to FL methods in simulating uncertainty in chemical plant utilities, including HPW systems, in typical cases where extreme events (for example, peaks in demand) or day-to-day variation, rather than average values, are of interest. The average production output or other statistical measures may, for instance, be of interest in the assessment of workshops. Furthermore, the results indicate that the stochastic model should be used only if found necessary by a deterministic simulation. Consequently, this thesis concludes that either deterministic or stochastic methods should be used to simulate uncertainty in chemical plant utility systems, and by extension some process systems, because extreme events and the modelling of day-to-day variation are important in capacity extension projects. Other reasons supporting the suggestion that stochastic HPW models are preferable to FL HPW models include: 1. The computer code for stochastic models is typically less complex than that for FL models, thus reducing code maintenance and validation issues. 2. In many respects FL models are similar to deterministic models.
Thus the need for an FL model over a deterministic model is questionable in the case of industrial-scale HPW systems as presented here (as well as other similar systems), since the latter is simpler. 3. An FL model may be difficult to "sell" to an end-user, as its results represent "approximate reasoning", a definition of which is, however, lacking. 4. Stochastic models may be applied, with some relatively minor modifications, to other systems, whereas FL models may not. For instance, the stochastic HPW model could be used to model municipal drinking water systems, whereas the FL HPW model could not. This is because the FL and stochastic model philosophies of an HPW system are fundamentally different. The stochastic model sees schedule and volume uncertainties as random phenomena described by statistical distributions based on either estimated or historical data. The FL model, on the other hand, simulates schedule uncertainties based on estimated operator behaviour, e.g. the tiredness of the operators and their working schedule. But in a municipal drinking water distribution system the notion of "operator" breaks down. 5. Stochastic methods can account for uncertainties that are difficult to model with FL. The FL HPW system model does not account for dispensed-volume uncertainty, as there appears to be no reasonable way to account for it with FL, whereas the stochastic model includes volume uncertainty.

Relevance:

60.00%

Publisher:

Abstract:

The operation of supply chains (SCs) has for many years been focused on efficiency, leanness and responsiveness. This has resulted in reduced slack in operations, compressed cycle times, increased productivity and minimised inventory levels along the SC. Combined with tight tolerance settings for the realisation of logistics and production processes, this has led to SC performances that are frequently not robust. SCs are becoming increasingly vulnerable to disturbances, which can decrease the competitive power of the entire chain in the market. Moreover, in the case of food SCs non-robust performances may ultimately result in empty shelves in grocery stores and supermarkets.
The overall objective of this research is to contribute to Supply Chain Management (SCM) theory by developing a structured approach to assess SC vulnerability, so that robust performances of food SCs can be assured. We also aim to help companies in the food industry to evaluate their current state of vulnerability, and to improve their performance robustness through a better understanding of vulnerability issues. The following research questions (RQs) stem from these objectives:
RQ1: What are the main research challenges related to (food) SC robustness?
RQ2: What are the main elements that have to be considered in the design of robust SCs and what are the relationships between these elements?
RQ3: What is the relationship between the contextual factors of food SCs and the use of disturbance management principles?
RQ4: How to systematically assess the impact of disturbances in (food) SC processes on the robustness of (food) SC performances?
To answer these RQs we used different methodologies, both qualitative and quantitative. For each question, we conducted a literature survey to identify gaps in existing research and define the state of the art of knowledge on the related topics. For the second and third RQ, we conducted both exploration and testing on selected case studies. Finally, to obtain more detailed answers to the fourth question, we used simulation modelling and scenario analysis for vulnerability assessment.
Main findings are summarised as follows.
Based on an extensive literature review, we answered RQ1. The main research challenges were related to the need to define SC robustness more precisely, to identify and classify disturbances and their causes in the context of the specific characteristics of SCs and to make a systematic overview of (re)design strategies that may improve SC robustness. Also, we found that it is useful to be able to discriminate between varying degrees of SC vulnerability and to find a measure that quantifies the extent to which a company or SC shows robust performances when exposed to disturbances.
To address RQ2, we define SC robustness as the degree to which a SC shows an acceptable performance in (each of) its Key Performance Indicators (KPIs) during and after an unexpected event that caused a disturbance in one or more logistics processes. Based on the SCM literature we identified the main elements needed to achieve robust performances and structured them together to form a conceptual framework for the design of robust SCs. We then explain the logic of the framework and elaborate on each of its main elements: the SC scenario, SC disturbances, SC performance, sources of food SC vulnerability, and redesign principles and strategies.
Based on three case studies, we answered RQ3. Our major findings show that the contextual factors have a consistent relationship to Disturbance Management Principles (DMPs). The product and SC environment characteristics are contextual factors that are hard to change and these characteristics initiate the use of specific DMPs as well as constrain the use of potential response actions. The process and the SC network characteristics are contextual factors that are easier to change, and they are affected by the use of the DMPs. We also found a notable relationship between the type of DMP likely to be used and the particular combination of contextual factors present in the observed SC.
To address RQ4, we presented a new method for vulnerability assessments, the VULA method. The VULA method helps to identify how much a company is underperforming on a specific Key Performance Indicator (KPI) in the case of a disturbance, how often this would happen and how long it would last. It ultimately informs the decision maker about whether process redesign is needed and what kind of redesign strategies should be used in order to increase the SC’s robustness. The VULA method is demonstrated in the context of a meat SC using discrete-event simulation. The case findings show that performance robustness can be assessed for any KPI using the VULA method.
To sum-up the project, all findings were incorporated within an integrated framework for designing robust SCs. The integrated framework consists of the following steps: 1) Description of the SC scenario and identification of its specific contextual factors; 2) Identification of disturbances that may affect KPIs; 3) Definition of the relevant KPIs and identification of the main disturbances through assessment of the SC performance robustness (i.e. application of the VULA method); 4) Identification of the sources of vulnerability that may (strongly) affect the robustness of performances and eventually increase the vulnerability of the SC; 5) Identification of appropriate preventive or disturbance impact reductive redesign strategies; 6) Alteration of SC scenario elements as required by the selected redesign strategies and repeat VULA method for KPIs, as defined in Step 3.
Contributions of this research are listed as follows. First, we have identified emerging research areas - SC robustness, and its counterpart, vulnerability. Second, we have developed a definition of SC robustness, operationalized it, and identified and structured the relevant elements for the design of robust SCs in the form of a research framework. With this research framework, we contribute to a better understanding of the concepts of vulnerability and robustness and related issues in food SCs. Third, we identified the relationship between contextual factors of food SCs and specific DMPs used to maintain robust SC performances: characteristics of the product and the SC environment influence the selection and use of DMPs; processes and SC networks are influenced by DMPs. Fourth, we developed specific metrics for vulnerability assessments, which serve as a basis of a VULA method. The VULA method investigates different measures of the variability of both the duration of impacts from disturbances and the fluctuations in their magnitude.
With this project, we also hope to have delivered practical insights into food SC vulnerability. First, the integrated framework for the design of robust SCs can be used to guide food companies in successful disturbance management. Second, empirical findings from case studies lead to the identification of changeable characteristics of SCs that can serve as a basis for assessing where to focus efforts to manage disturbances. Third, the VULA method can help top management to get more reliable information about the “health” of the company.
The two most important research opportunities are: First, there is a need to extend and validate our findings related to the research framework and contextual factors through further case studies related to other types of (food) products and other types of SCs. Second, there is a need to further develop and test the VULA method, e.g.: to use other indicators and statistical measures for disturbance detection and SC improvement; to define the most appropriate KPI to represent the robustness of a complete SC. We hope this thesis invites other researchers to pick up these challenges and help us further improve the robustness of (food) SCs.
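A minimal sketch of the kind of vulnerability metrics the VULA method reports for one KPI: how far below target the KPI falls, how often, and for how long. This is an illustration of the idea only, not the thesis' actual implementation; the KPI series and target are hypothetical.

```python
def vula_summary(kpi_series, target):
    """Summarise underperformance of a KPI time series against a target:
    worst shortfall, number of below-target episodes, longest episode."""
    episodes = []      # lengths of consecutive below-target runs
    shortfalls = []    # magnitude of each below-target observation
    run = 0
    for value in kpi_series:
        if value < target:
            run += 1
            shortfalls.append(target - value)
        else:
            if run:
                episodes.append(run)
            run = 0
    if run:
        episodes.append(run)
    return {
        "worst_shortfall": max(shortfalls, default=0),
        "episodes": len(episodes),
        "longest_episode": max(episodes, default=0),
    }

# Daily service level (%) for a hypothetical meat SC, target 95%.
summary = vula_summary([97, 96, 93, 92, 96, 98, 94, 97], target=95)
```

In the thesis these quantities are obtained from discrete-event simulation output rather than observed data, and they feed the decision on whether process redesign is needed.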

Relevance:

60.00%

Publisher:

Abstract:

Lovastatin biosynthesis depends on the relative concentrations of dissolved oxygen and the carbon and nitrogen resources. An elucidation of the underlying relationship would facilitate the derivation of a controller for improving lovastatin yield in bioprocesses. To achieve this goal, batch submerged cultivation experiments of lovastatin production by Aspergillus flavipus BICC 5174, using both lactose and glucose as carbon sources, were performed in a 7-liter bioreactor, and the data were used to determine how the relative concentrations of lactose, glucose, glutamine and oxygen affected lovastatin yield. A model was developed based on these results, and its predictions were validated against an independent set of batch data obtained from a 15-liter bioreactor using five statistical measures, including the Willmott index of agreement. A nonlinear controller was designed, considering that dissolved oxygen and lactose concentrations could be measured online, using the lactose feed rate and airflow rate as process inputs. Simulation experiments were performed to demonstrate that a practical implementation of the nonlinear controller would produce satisfactory outcomes. This is the first model that correlates lovastatin biosynthesis with the carbon-nitrogen proportion and possesses a structure suitable for implementing a strategy for controlling lovastatin production.
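One of the five validation measures named above, the Willmott index of agreement, is easy to sketch: d = 1 - Σ(P−O)² / Σ(|P−Ō| + |O−Ō|)², ranging from 0 (no agreement) to 1 (perfect agreement). The observed/predicted values below are illustrative, not from the study.

```python
import numpy as np

def willmott_index(observed, predicted):
    """Willmott index of agreement d between observations and predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    o_mean = observed.mean()
    num = np.sum((predicted - observed) ** 2)
    den = np.sum((np.abs(predicted - o_mean) + np.abs(observed - o_mean)) ** 2)
    return 1.0 - num / den

observed = [1.0, 2.0, 3.0, 4.0]
perfect = willmott_index(observed, observed)          # identical series -> 1.0
rough = willmott_index(observed, [1.1, 2.2, 2.7, 4.3])  # small errors -> near 1
```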

Relevance:

60.00%

Publisher:

Abstract:

BACKGROUND: Sleep-disordered breathing is a common and serious feature of many paediatric conditions and is particularly a problem in children with Down syndrome. Overnight pulse oximetry is recommended as an initial screening test, but it is unclear how overnight oximetry results should be interpreted and how many nights should be recorded.

METHODS: This retrospective observational study evaluated night-to-night variation using statistical measures of repeatability for 214 children referred to a paediatric respiratory clinic, who required overnight oximetry measurements. This included 30 children with Down syndrome. We measured length of adequate trace, basal SpO2, number of desaturations (>4% SpO2 drop for >10 s) per hour ('adjusted index') and time with SpO2<90%. We classified oximetry traces into normal or abnormal based on physiology.

RESULTS: 132 out of 214 (62%) children had three technically adequate nights' oximetry, including 13 out of 30 (43%) children with Down syndrome. The intraclass correlation coefficient for the adjusted index was 0.54 (95% CI 0.20 to 0.81) among children with Down syndrome and 0.88 (95% CI 0.84 to 0.91) for children with other diagnoses. The negative predictive value of a negative first night for predicting two subsequent negative nights was 0.2 in children with Down syndrome and 0.55 in children with other diagnoses.

CONCLUSIONS: There is substantial night-to-night variation in overnight oximetry readings among children in all clinical groups undergoing overnight oximetry. This is a more pronounced problem in children with Down syndrome. Increasing the number of attempted nights' recording from one to three provides useful additional clinical information.
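The 'adjusted index' defined in the Methods (desaturations of >4% SpO2 lasting >10 s, per hour of adequate trace) can be sketched as a simple run-length scan over the SpO2 trace. The trace, baseline, and sampling rate below are illustrative assumptions, not the study's recording parameters.

```python
def desaturation_index(spo2, baseline, sample_rate_hz=1.0):
    """Desaturation events (>4% SpO2 drop lasting >10 s) per hour of trace."""
    events = 0
    run = 0
    for value in spo2:
        if baseline - value > 4.0:          # more than 4% below baseline
            run += 1
        else:
            if run > 10 * sample_rate_hz:   # dip lasted longer than 10 s
                events += 1
            run = 0
    if run > 10 * sample_rate_hz:           # dip still open at end of trace
        events += 1
    hours = len(spo2) / sample_rate_hz / 3600.0
    return events / hours

# One hour of stable SpO2 at 97% with a single 20-second dip to 90%.
trace = [97.0] * 1800 + [90.0] * 20 + [97.0] * 1780
index = desaturation_index(trace, baseline=97.0)
```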

Relevance:

60.00%

Publisher:

Abstract:

A large number of urban surface energy balance models now exist with different assumptions about the important features of the surface and exchange processes that need to be incorporated. To date, no comparison of these models has been conducted; in contrast, models for natural surfaces have been compared extensively as part of the Project for Intercomparison of Land-surface Parameterization Schemes. Here, the methods and first results from an extensive international comparison of 33 models are presented. The aim of the comparison overall is to understand the complexity required to model energy and water exchanges in urban areas. The degree of complexity included in the models is outlined and impacts on model performance are discussed. During the comparison there have been significant developments in the models with resulting improvements in performance (root-mean-square error falling by up to two-thirds). Evaluation is based on a dataset containing net all-wave radiation, sensible heat, and latent heat flux observations for an industrial area in Vancouver, British Columbia, Canada. The aim of the comparison is twofold: to identify those modeling approaches that minimize the errors in the simulated fluxes of the urban energy balance and to determine the degree of model complexity required for accurate simulations. There is evidence that some classes of models perform better for individual fluxes but no model performs best or worst for all fluxes. In general, the simpler models perform as well as the more complex models based on all statistical measures. Generally the schemes have best overall capability to model net all-wave radiation and least capability to model latent heat flux.

Relevance:

60.00%

Publisher:

Abstract:

A model for estimating the turbulent kinetic energy dissipation rate in the oceanic boundary layer, based on insights from rapid-distortion theory, is presented and tested. This model provides a possible explanation for the very high dissipation levels found by numerous authors near the surface. It is conceived that turbulence, injected into the water by breaking waves, is subsequently amplified due to its distortion by the mean shear of the wind-induced current and straining by the Stokes drift of surface waves. The partition of the turbulent shear stress into a shear-induced part and a wave-induced part is taken into account. In this picture, dissipation enhancement results from the same mechanism responsible for Langmuir circulations. Apart from a dimensionless depth and an eddy turn-over time, the dimensionless dissipation rate depends on the wave slope and wave age, which may be encapsulated in the turbulent Langmuir number La_t. For large La_t, or for any La_t at large depth, the dissipation rate tends to the usual surface-layer scaling, whereas when La_t is small it is strongly enhanced near the surface, growing asymptotically as ε ∝ La_t^{-2} when La_t → 0. Results from this model are compared with observations from the WAVES and SWADE data sets, assuming that this is the dominant dissipation mechanism acting in the ocean surface layer; statistical measures of the corresponding fit indicate a substantial improvement over previous theoretical models. Comparisons are also carried out against more recent measurements, showing good order-of-magnitude agreement, even when shallow-water effects are important.

Relevance:

60.00%

Publisher:

Abstract:

Data mining can be used in the healthcare industry to "mine" clinical data and discover hidden information for intelligent and effective decision making. Hidden patterns and relationships often go undiscovered, and advanced data mining techniques can serve as a remedy for this. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms, applied to predict chronic renal disease. Data from the database are initially imported into Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers were applied and the efficiency of the output evaluated. The three classifiers analyzed are: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour (KNN) algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of the Decision Tree and KNN was almost the same, but Naïve Bayes proved to have a comparative edge over the others. Further, sensitivity and specificity tests are used as statistical measures to examine the performance of the binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
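The two statistical measures described above follow directly from a binary classifier's confusion matrix; a minimal sketch with illustrative counts (not from the thesis):

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # proportion of actual positives found
    specificity = tn / (tn + fp)   # proportion of actual negatives found
    return sensitivity, specificity

# E.g. a hypothetical classifier flagging chronic renal disease on a
# held-out test set of 150 cases.
sens, spec = sensitivity_specificity(tp=45, fp=10, tn=90, fn=5)
```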

Relevance:

60.00%

Publisher:

Abstract:

This thesis focuses on the distribution of income across income units, as defined by the Australian Bureau of Statistics, in Australia in 1986. An examination of the conceptual issues involved in analysing income distribution is followed by a description of the various statistical and normative inequality measures that may be used to determine the level of inequality. Previous Australian studies are reviewed before the 1986 Income Distribution Survey is analysed. The analysis focuses on the summary statistical measures of the Gini coefficient, the coefficient of variation, and the percentile shares. In addition, the contribution of the income of various population sub-groups to overall inequality is examined to provide insight into the sources of inequality. To this end, the Gini coefficient is decomposed using a method developed by Podder (1991), whereby the population is divided into a number of sub-groups based on one socio-demographic characteristic at a time. The exact effect on overall inequality of a percentage change in income for a particular sub-group, as well as the elasticity of the Gini coefficient with respect to a sub-group, can be computed. The decomposition is undertaken using both the unadjusted and the equivalent gross weekly income. Policy considerations and conclusions regarding the level of inequality as it existed in 1986 are suggested in the final chapter.
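The Gini coefficient, the main summary measure analysed above, can be sketched from the standard sorted-rank formula G = Σᵢ (2i − n − 1)·xᵢ / (n·Σx) for incomes sorted in ascending order. The incomes below are illustrative, not survey data.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes (0 = perfect
    equality, values approaching 1 = extreme inequality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Weighted sum over sorted incomes, i = 1..n.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

equal = gini([100, 100, 100, 100])   # perfect equality
unequal = gini([0, 0, 0, 400])       # one unit holds all income
```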